Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by sanjay (Valued Contributor II)
  • 11022 Views
  • 3 replies
  • 1 kudos

Resolved! PySpark dropDuplicates performance issue

Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that huge, but it still takes time to execute this command: df = df.dropDuplicates(["fileName"]). Is there any better approach to d...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @sanjay, when it comes to handling duplicate data in a PySpark DataFrame, there are more effective techniques than relying on a blanket dropDuplicates(). Let's dive into some alternatives: Utilizing dropDuplicates() with Column Su...
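
For readers landing here from search, a minimal sketch of the subset-based and window-based approaches this reply alludes to, assuming a DataFrame df keyed by fileName and a hypothetical ingest_time ordering column:

```python
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

# Subset-based dedup: an arbitrary row per fileName survives.
deduped = df.dropDuplicates(["fileName"])

# Window-based dedup: makes the survivor explicit (here, the newest row).
# ingest_time is a hypothetical ordering column.
w = Window.partitionBy("fileName").orderBy(col("ingest_time").desc())
deduped = (
    df.withColumn("rn", row_number().over(w))
      .filter(col("rn") == 1)
      .drop("rn")
)

# In a continuously running (streaming) pipeline, bound the dedup state
# with a watermark so it does not grow without limit:
# df.withWatermark("ingest_time", "1 hour").dropDuplicates(["fileName", "ingest_time"])
```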

2 More Replies
by oishimbo (New Contributor)
  • 4563 Views
  • 2 replies
  • 0 kudos

Databricks time travel - how to get ALL changes ever made to a table

Hi time travel gurus, I am investigating creating a reporting solution with AsOf functionality. Users will be able to create a report based on the current data or on the data as of some time ago. Due to the nature of our data, this AsOf feature is qu...
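
For context, a minimal sketch of Delta time travel reads, assuming a Delta table named events (a hypothetical name):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# List every version ever committed to the table.
history = spark.sql("DESCRIBE HISTORY events")

# Read the table as of a specific version or timestamp.
v5 = spark.read.option("versionAsOf", 5).table("events")
asof = spark.read.option("timestampAsOf", "2024-01-01 00:00:00").table("events")
```

Note that time travel only reaches back as far as the table's retention and VACUUM settings allow, which matters for an "all changes ever" requirement.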

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

1 More Replies
by Miasu (New Contributor II)
  • 959 Views
  • 1 reply
  • 0 kudos

Unable to analyze external table | FileAlreadyExistsException

Hello experts, there's a CSV file, "nyc_taxi.csv", saved under users/myfolder on DBFS, and I used this file to create 2 tables: 1. nyc_taxi: created using the UI, and it appeared as a managed table saved under dbfs:/user/hive/warehouse/mydatabase.db/nyc...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Miasu, when you executed the ANALYZE TABLE command on the nyc_taxi2 table, a FileAlreadyExistsException appeared because the target path already exists and cannot be reused for the operation. To find a resolution, let's delve into some pot...
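
A minimal sketch of the kind of check that usually helps here, run in a Databricks notebook; the table name comes from the post, while the conflicting path below is hypothetical:

```python
# Inspect where the table actually points before re-running ANALYZE.
spark.sql("DESCRIBE EXTENDED nyc_taxi2").show(truncate=False)

# If a stale directory is the conflict, list it to confirm before
# removing anything (path is hypothetical).
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/mydatabase.db/nyc_taxi2"))

# Then retry the statistics collection.
spark.sql("ANALYZE TABLE nyc_taxi2 COMPUTE STATISTICS")
```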

by chrisf_sts (New Contributor II)
  • 7367 Views
  • 2 replies
  • 1 kudos

Resolved! After moving a mounted S3 bucket under Unity Catalog control, Python file paths no longer work

I had been using a mounted external S3 bucket with JSON files up until a few days ago, when my company moved all file mounts under Unity Catalog control. Suddenly I can no longer run a command like: with open("/mnt/my_files/my_json...
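
Under Unity Catalog, local-style /mnt/... paths typically move to Volumes paths, which still work with plain Python file APIs. A hedged sketch with hypothetical catalog, schema, and volume names:

```python
import json

# UC Volumes replace /mnt mounts but remain POSIX-style paths,
# so ordinary Python file I/O keeps working (names are hypothetical).
with open("/Volumes/my_catalog/my_schema/my_files/my_json.json") as f:
    data = json.load(f)
```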

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
by Boyan (New Contributor II)
  • 1325 Views
  • 2 replies
  • 0 kudos

Running unit tests and hyperopt causes a broadcast variable exception

Hello, we are using hyperopt to train a model with a relatively large training dataset. We've experienced some performance issues and, following the suggestions in this notebook, we broadcasted the dataset. To verify that broadcasting the dataset resolved the ...
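
For readers hitting the same issue, a minimal sketch of the broadcast pattern with hyperopt, assuming a pandas DataFrame train_pdf and a hypothetical train_and_score helper:

```python
from hyperopt import STATUS_OK, fmin, hp, tpe

# Broadcast the training data once so each trial reads the same copy
# instead of re-shipping it with every task.
bc_train = spark.sparkContext.broadcast(train_pdf)

def objective(params):
    data = bc_train.value                      # materialized on the worker
    loss = train_and_score(data, params)       # hypothetical helper
    return {"loss": loss, "status": STATUS_OK}

best = fmin(
    fn=objective,
    space={"lr": hp.loguniform("lr", -5, 0)},
    algo=tpe.suggest,
    max_evals=20,
)
```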

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Boyan, here are a few links for you. Use Databricks Repos and the Repos API: Databricks Repos allow cloning whole git repositories in Databricks, and with the help of the Repos API you can automate this process by first cloning a git repository and then ...

1 More Replies
by tinai_long (New Contributor III)
  • 7550 Views
  • 10 replies
  • 4 kudos

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables framework with 2 tables: Table 1 ingests from a json source, Table 2 reads from Table 1 and runs some transformation. In other words, the data flow is json source -> Table 1 -> Table 2. Now if I find some bugs in the...

Latest Reply
cpayne_vax
New Contributor III
  • 4 kudos

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...
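
For automation, the same selective refresh appears to be exposed through the pipeline updates REST endpoint; a hedged sketch only, with placeholder host, token, and pipeline ID, refreshing the thread's "Table 2" by name:

```python
import requests

host = "https://<workspace-host>"     # placeholder
token = "<personal-access-token>"     # placeholder
pipeline_id = "<pipeline-id>"         # placeholder

# Start an update that refreshes only the selected tables.
resp = requests.post(
    f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {token}"},
    json={"refresh_selection": ["table_2"]},
)
resp.raise_for_status()
print(resp.json())
```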

9 More Replies
by brickster_2018 (Esteemed Contributor)
  • 10731 Views
  • 3 replies
  • 6 kudos

Resolved! How to add custom logging in Databricks

I want to add custom logs that are redirected into the Spark driver logs. Can I use the existing logger classes to get my application logs or progress messages into the Spark driver logs?
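
A minimal sketch of two common approaches in a Databricks notebook; the logger name is arbitrary, and the log4j route is an assumption based on the 1.x-style API being bridged on newer runtimes:

```python
import logging

# Option 1: plain Python logging; output lands in the driver's
# stdout/stderr logs.
py_logger = logging.getLogger("my_app")
py_logger.setLevel(logging.INFO)
py_logger.addHandler(logging.StreamHandler())
py_logger.info("progress message from Python logging")

# Option 2: write through the JVM's log4j so messages interleave
# with Spark's own driver log4j output.
j_logger = spark._jvm.org.apache.log4j.LogManager.getLogger("my_app")
j_logger.info("progress message via log4j")
```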

Latest Reply
Kaizen
Valued Contributor
  • 6 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logged with all the other cluster logs (see image). 2) Also, it seems like a lot of blank files are being created for this. Is this a bug? This include...

2 More Replies
by sha (New Contributor)
  • 1103 Views
  • 1 reply
  • 0 kudos

Importing data from S3 to an Azure Databricks cluster with Unity Catalog in Shared mode

Environment details: Databricks on Azure, 13.3 LTS, Unity Catalog, Shared cluster mode. Currently, in the environment I'm in, we run imports from S3 with code like spark.read.option('inferSchema', 'true').json(s3_path). When running on a cluster in Sha...

Latest Reply
BR_DatabricksAI
Contributor
  • 0 kudos

Hello Sha, we usually see such errors while working in shared cluster mode. Assuming this is your dev environment, the simplest way to avoid the error is to use a different cluster. However, as an alternative solution, in case you would like to keep the shared cluster, the...
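
In case it helps other readers: the Unity Catalog-native alternative on a shared cluster is usually to route the S3 access through an external location grant rather than cluster-level credentials. A hedged sketch, with a hypothetical bucket, path, and schema:

```python
from pyspark.sql.types import StringType, StructField, StructType

# Once READ FILES is granted on an external location covering this
# bucket, the path can be read directly on a shared cluster
# (bucket/path are hypothetical).
s3_path = "s3://my-bucket/imports/"

# An explicit schema also avoids the extra inference pass over S3.
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])
df = spark.read.schema(schema).json(s3_path)
```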

by Dhruv-22 (New Contributor III)
  • 3139 Views
  • 4 replies
  • 0 kudos

CREATE TABLE does not overwrite location whereas CREATE OR REPLACE TABLE does

I am working on Azure Databricks, with Databricks Runtime version being - 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @Dhruv-22, based on the information you shared above, the CREATE OR REPLACE and CREATE commands in Databricks do have different behaviours, particularly when it comes to handling tables with specific target locations. The "CREATE OR REPLACE"...
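
A small sketch of the asymmetry being described, with hypothetical table names and location, and comments reflecting the behaviour as this thread reports it:

```python
# As reported in the thread: CREATE TABLE at a location that already
# holds data registers the table over the existing files; it does not
# overwrite them (names and location are hypothetical).
spark.sql("""
    CREATE TABLE f1_processed.results
    USING DELTA
    LOCATION 'abfss://processed@account.dfs.core.windows.net/f1/results'
    AS SELECT * FROM v1
""")

# CREATE OR REPLACE TABLE replaces the table definition and its contents.
spark.sql("""
    CREATE OR REPLACE TABLE f1_processed.results
    USING DELTA
    LOCATION 'abfss://processed@account.dfs.core.windows.net/f1/results'
    AS SELECT * FROM v1
""")
```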

3 More Replies
by DApt (New Contributor II)
  • 8245 Views
  • 1 reply
  • 2 kudos

REDACTED_POSSIBLE_SECRET_ACCESS_KEY as part of column value result from aes_encrypt

Hi, I've encountered an error using base64/aes_encrypt: as a result, the saved string contains 'REDACTED_POSSIBLE_SECRET_ACCESS_KEY' at the end, destroying the original data and rendering it undecryptable. Is there a way to avoid this replacement in...

Latest Reply
DataEnthusiast1
New Contributor II
  • 2 kudos

I had the same issue, and my usage was similar to the OP's: base64(aes_encrypt(<clear_text>, unbase64(secret(<scope>, <key>)))). Databricks support suggested not calling secret() within the insert/update operation that writes to the table. After updating the py...
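
A hedged sketch of that suggestion, fetching the key outside the SQL statement; the scope, key, and table names are hypothetical, and the parameterized spark.sql form assumes a runtime with Spark 3.4+:

```python
# Fetch the key once in Python rather than calling secret() inside the
# INSERT that writes to the table (scope/key names are hypothetical).
key_b64 = dbutils.secrets.get(scope="my_scope", key="aes_key")

spark.sql(
    """
    INSERT INTO target_table
    SELECT base64(aes_encrypt(clear_text, unbase64(:k))) AS encrypted
    FROM source_table
    """,
    args={"k": key_b64},
)
```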

by Dhruv-22 (New Contributor III)
  • 2927 Views
  • 4 replies
  • 1 kudos

Resolved! REPLACE TABLE AS SELECT is not working with parquet whereas it works fine for delta

I am working on Azure Databricks, with Databricks Runtime version being - 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 1 kudos

Hi @Dhruv-22, we understand that you are facing this error when using REPLACE TABLE AS SELECT on a Parquet table; at this moment, the REPLACE TABLE AS SELECT operation you're trying to perform is not supported for Parquet tables. Accord...

3 More Replies
by Kroy (Contributor)
  • 1299 Views
  • 4 replies
  • 0 kudos

Near real-time solution for data from a core system that gets updated

We are trying to build a solution where customer data stored in an RDBMS (SQL Server) is moved to a Delta Lake in a medallion architecture, and we want this to be near real time using a DLT pipeline. The problem is that the source tab...
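
For the change-data half of this, a minimal sketch of DLT's apply_changes flow, assuming a bronze CDC feed is already landing from SQL Server; all names here are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

# Silver table that stays in sync with the mutating source rows.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_bronze_cdc",        # hypothetical CDC feed from SQL Server
    keys=["customer_id"],
    sequence_by=col("change_timestamp"),  # ordering column in the feed
    stored_as_scd_type=1,                 # keep only the latest version of each row
)
```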

Latest Reply
Kroy
Contributor
  • 0 kudos

I came across this matrix while reading about DLT. What do "read from complete" and "write to incremental" mean?

3 More Replies
by billfoster (New Contributor II)
  • 14205 Views
  • 9 replies
  • 4 kudos

How can I learn Databricks?

I am currently enrolled in a data engineering boot camp. We go over various technologies: Azure, PySpark, Airflow, Hadoop, NoSQL, SQL, Python. But not something like Databricks. I am in contact with lots of recent graduates who landed a job. Almo...

Latest Reply
Ali23
New Contributor II
  • 4 kudos

I'd be glad to help you on your journey to learning Databricks! Whether you're a beginner or aiming to advance your skills, here's a comprehensive guide: Foundations: Solid understanding of core concepts: Begin with foundational knowledge in big data,...

8 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
