Data Engineering

Forum Posts

by aimas • New Contributor III

10-12-2021 4:45:24 PM

3817 Views
8 replies
5 kudos

Resolved! error creating tables using UI

Hi, I'm trying to create a table using the UI, but I keep getting the error "error creating table <table name> create a cluster first" even though I already have a cluster running. What is the problem?

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-13-2021 2:13:29 AM

5 kudos

Make sure a cluster is selected (using the arrow in the database view) and that at least the Default database exists.

7 More Replies

by Orianh • Valued Contributor II

10-14-2021 1:59:31 AM

14092 Views
11 replies
10 kudos

Resolved! Read JSON files from the s3 bucket

Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no result", or, if I don't specify the schema, "unable to infer a schema". I tried mounting the S3 bucket, and it still doesn't work. Here is some code th...

Latest Reply

Prabakar
Esteemed Contributor III

10-14-2021 3:42:37 AM

10 kudos

Please refer to the doc that helps you read JSON. If you are getting this error, the problem is probably with the JSON schema, so please validate it. As a test, create a simple JSON file (you can get one on the internet), upload it to your S3 bucket, and ...
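One way to do that validation locally, as a minimal sketch in plain Python. By default Spark's JSON reader expects one complete JSON value per line, while a pretty-printed file has to be read with the `multiLine` option. The helper name `classify_json_text` is hypothetical and not from the thread; it only reports which layout a file's text has.

```python
import json


def classify_json_text(text):
    """Return 'json_lines', 'multiline', or 'invalid' for a file's contents."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "invalid"
    try:
        # Default Spark layout: every non-empty line is a complete JSON value.
        for ln in lines:
            json.loads(ln)
        return "json_lines"
    except json.JSONDecodeError:
        pass
    try:
        # A pretty-printed document only parses as a whole; Spark would need
        # .option("multiLine", "true") to read this layout.
        json.loads(text)
        return "multiline"
    except json.JSONDecodeError:
        return "invalid"


print(classify_json_text('{"a": 1}\n{"a": 2}\n'))
print(classify_json_text('{\n  "a": 1\n}'))
```

If a file comes back as "invalid", the problem is in the file itself rather than in the bucket or mount configuration.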

10 More Replies

by Data_Bricks1 • New Contributor III

10-13-2021 11:47:18 AM

1927 Views
7 replies
0 kudos

Data from 10 Blob containers and multiple hierarchical folders (daily and hourly folders) in each container to a Delta Lake table in Parquet format - incremental loading of the latest data only, inserts with no updates

I am able to load data for a single container by hard-coding it, but I'm not able to load from multiple containers. I used a for loop, but the data frame loads only the last container's last folder record. One more issue is that I have to flatten the data, when I ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-14-2021 3:48:17 AM

0 kudos

For sure the function (def) should be declared outside the loop; move it to just after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break, sleep, etc.).
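A rough sketch of that structure in plain Python, assuming the "only the last folder" symptom comes from reassigning the result on every iteration. The names `flatten_record`, `load_all`, and `read_folder` are hypothetical stand-ins, not the poster's code; with Spark the same idea would be to union the per-folder DataFrames (for example with `DataFrame.unionByName`) or append each one to the Delta table.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level; declared once, outside any loop."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


def load_all(containers, read_folder):
    """Accumulate rows from every container and folder instead of overwriting."""
    all_rows = []
    for container, folders in containers.items():
        for folder in folders:
            rows = read_folder(container, folder)
            # extend, not assign: assigning here would keep only the last folder
            all_rows.extend(flatten_record(r) for r in rows)
    return all_rows


# Stand-in data and reader, purely for illustration.
fake_storage = {
    ("c1", "2021/10/13/00"): [{"id": 1, "meta": {"src": "a"}}],
    ("c2", "2021/10/13/01"): [{"id": 2, "meta": {"src": "b"}}],
}
containers = {"c1": ["2021/10/13/00"], "c2": ["2021/10/13/01"]}
rows = load_all(containers, lambda c, f: fake_storage[(c, f)])
print(rows)
```

Printing the accumulated result after each iteration is the plain-Python equivalent of the display() or .show() check suggested above.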

6 More Replies

by StephanieRivera • Valued Contributor II

10-13-2021 2:55:42 PM

983 Views
1 reply
5 kudos

Resolved! Are there data types that are not good in Delta format? Does Delta handle images, audio, and video?

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-14-2021 1:08:02 AM

5 kudos

Hi, Delta tables are transactional (there are history commits and snapshots). I would not store images or videos there, as the data can be saved several times and you will have high storage costs; it can also be slow when the data is big. I would definitely store images,...

by Data_Bricks1 • New Contributor III

10-13-2021 12:00:45 PM

458 Views
1 reply
0 kudos

BLOB multi container data loading

Latest Reply

Anonymous
Not applicable

10-13-2021 12:31:37 PM

0 kudos

@Rajeswari Gummadi - Is this a duplicate of your other thread? I don't see any content and I want to make sure all your questions are answered.

by User16826994223 • Honored Contributor III

06-25-2021 9:45:03 AM

3681 Views
2 replies
1 kudos

AssertionError: assertion failed: Unable to delete the record but I am able to select it though

Is there any reason this command works well: %sql SELECT * FROM datanase.table WHERE salary > 1000 returning 2 rows, while the one below: %sql delete FROM datanase.table WHERE salary > 1000 returns an error: Error in SQL statement: AssertionError: assertion failed:...

Latest Reply

User16826994223
Honored Contributor III

06-25-2021 9:45:49 AM

1 kudos

DELETE FROM (and similarly UPDATE) aren't supported on Parquet files. Right now on Databricks they are supported for the Delta format. You can convert your Parquet files to Delta using CONVERT TO DELTA, and then this command will work for you.
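For illustration, a small sketch of what the conversion statement could look like. The helper `convert_to_delta_sql` is a hypothetical name that only builds the SQL text; `datanase.table` is the table name from the question and `/mnt/data/events` is a made-up path. Partitioned Parquet directories would also need a PARTITIONED BY clause, which this sketch leaves out.

```python
def convert_to_delta_sql(path=None, table=None):
    """Build a CONVERT TO DELTA statement for a Parquet directory or a table."""
    if (path is None) == (table is None):
        raise ValueError("pass exactly one of path or table")
    target = f"parquet.`{path}`" if path is not None else table
    return f"CONVERT TO DELTA {target}"


print(convert_to_delta_sql(table="datanase.table"))
print(convert_to_delta_sql(path="/mnt/data/events"))

# In a Databricks notebook the statement would be run through spark.sql,
# after which the DELETE from the question should be accepted:
#   spark.sql(convert_to_delta_sql(table="datanase.table"))
#   spark.sql("DELETE FROM datanase.table WHERE salary > 1000")
```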

1 More Replies

by dataslicer • Contributor

09-27-2021 4:16:50 PM

5653 Views
4 replies
4 kudos

Resolved! Unable to save Spark Dataframe to driver node's local file system as CSV file

Running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark dataframes that I am able to source via credential passthrough, reading from ADLS Gen2 via the `abfss://` method, and display the full content ...

Latest Reply

Dan_Z
Honored Contributor

10-12-2021 1:41:59 PM

4 kudos

Modern Spark operates by a design choice to separate storage and compute. So saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...
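The reply is cut off here, so the following is not necessarily the approach it goes on to recommend. As a minimal sketch for small dataframes only, assuming the rows have already been brought to the driver (for example with `df.collect()` or `df.toPandas()`), they can be written with ordinary Python file I/O. The function name and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def write_rows_to_local_csv(header, rows, path):
    """Write already-collected rows to a file on the local (driver) disk."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Stand-in data; on a cluster the rows could come from df.collect().
header = ["id", "name"]
rows = [(1, "a"), (2, "b")]
out_path = os.path.join(tempfile.gettempdir(), "small_df.csv")
write_rows_to_local_csv(header, rows, out_path)
print(out_path)
```

This only makes sense when the data fits comfortably in the driver's memory; larger outputs belong in shared storage.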

3 More Replies

by PaulHernandez • New Contributor II

11-28-2019 11:13:37 AM

14948 Views
7 replies
0 kudos

Resolved! How to show an image in a notebook using html?

Hi everyone, I'm just learning how to personalize Databricks notebooks and would like to show a logo in a cell. I installed the Databricks CLI and was able to upload the image file to DBFS. I tried to display it like this: displayHTML("<im...

Latest Reply

_robschaper
New Contributor II

10-12-2021 9:39:09 AM

0 kudos

@Paul Hernandez @Sean Owen @Navneet Tuteja I solved this after I also ran into the same issue where my notebook suddenly wouldn't show an image sitting on the driver in an accessible folder - no matter what I was trying in the notebook the display...

6 More Replies

by daindana • New Contributor III

09-29-2021 10:20:17 PM

6413 Views
4 replies
4 kudos

Resolved! Why doesn't my notebook display widgets when I use 'dbutils' while it is displayed with '%sql CREATE WIDGET'?

The widget is not shown when I use dbutils, while it works perfectly with SQL. For example, %sql CREATE WIDGET TEXT state DEFAULT "CA" This one shows me the widget. dbutils.widgets.text("name", "Brickster", "Name") dbutils.widgets.multiselect("colors", "oran...

Latest Reply

daindana
New Contributor III

10-11-2021 4:45:54 PM

4 kudos

Hello, Ryan! For some reason, this problem is solved, and now it is working perfectly! I did nothing new, but it is just working now. Thank you!:)

3 More Replies

by BorislavBlagoev • Valued Contributor III

09-24-2021 9:24:10 AM

3080 Views
5 replies
4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream data_frame.writeStream.format("delta") \ .option("checkpointLocation", checkpoint_path) \ .trigger(processingTime="1 second") \ .option("mergeSchema", "true") \ .o...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-12-2021 6:46:15 AM

4 kudos

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this to your code before running the stream: dbutils.fs.rm(checkpoint_path, True) Additionally you...

4 More Replies

by halfwind22 • New Contributor III

10-11-2021 1:42:37 AM

6680 Views
11 replies
12 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv ()

I am using a Python function to read some data from a GET endpoint and write it as a CSV file to an Azure Blob location. My GET endpoint takes 2 query parameters, param1 and param2. So initially, I have a dataframe paramDf that has two columns param1 and ...

Latest Reply

halfwind22
New Contributor III

10-12-2021 6:38:33 AM

12 kudos

@Hubert Dudek I can't issue a Spark command on an executor node; it throws an error because foreach distributes the processing.

10 More Replies

by ItsMe • New Contributor II

10-06-2021 11:58:11 PM

1683 Views
4 replies
7 kudos

Resolved! Run Pyspark job of Python egg package using spark submit on databricks

Error: missing application resource. I am getting this error while running a job with spark-submit. I gave the following parameters while creating the job: --conf spark.yarn.appMasterEnv.PYSAPRK_PYTHON=databricks/path/python3 --py-files dbfs/path/to/.egg job_m...

Latest Reply

User16752246494
Contributor

10-11-2021 11:32:45 AM

7 kudos

Hi, we tried to simulate the question on our end: we packaged a module inside a whl file. Then, to access the wheel file, we created another Python file, test_whl_locally.py. Inside test_whl_locally.py, to access the content of the wheel file...

3 More Replies

by afshinR • New Contributor III

10-11-2021 5:24:46 PM

467 Views
1 reply
1 kudos

Hi, could you please help me with my question? I have not gotten any answers.

Hi, could you please help me with my question? I have not gotten any answers.

Latest Reply

Kaniz
Community Manager

10-11-2021 7:50:16 PM

1 kudos

Hi @afshin riahi, yes, I can definitely help you with it. Please wait while I or someone from the community gets back to you with a response. Thank you for your patience.

by User16868770416 • Contributor

10-11-2021 3:56:47 PM

3198 Views
1 reply
0 kudos

What is the best way to decode protobuf using pyspark?

I am using spark structured streaming to read a protobuf encoded message from the event hub. We use a lot of Delta tables, but there isn't a simple way to integrate this. We are currently using K-SQL to transform into avro on the fly and then use Dat...

Latest Reply

jose_gonzalez
Moderator

10-11-2021 4:38:23 PM

0 kudos

Hi @Will Block, I think a related question has been asked in the past. I think it was this one. I found this library; I hope it helps.

by marchello • New Contributor III

09-04-2021 12:46:37 AM

3623 Views
9 replies
3 kudos

Resolved! error on connecting to Snowflake

Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...

Latest Reply

Dan_Z
Honored Contributor

10-11-2021 2:18:01 PM

3 kudos

@marchello I suggest you contact Snowflake to move forward on this one.

8 More Replies
