Data Engineering

Forum Posts

omsas
by New Contributor
  • 1661 Views
  • 2 replies
  • 0 kudos

How to add Columns for Automatic Fill on Pandas Python

1. I have data x; I would like to create a new column whose values are 1, 2 or 3.
2. The name of the column is SHIFT, where this SHIFT column will be filled automatically if the TIME_CREATED column meets the conditions.
3. The conditi...

Columns Table Result of tested
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

You can do something like this in pandas. Note there could be a more performant way to do this too.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3, 4]})
df.head()
>    a
> 0  1
> 1  2
> 2  3
> 3  4

conditions = [(df['a'] <= 2...
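Since the reply is truncated above, here is a minimal sketch of the np.select pattern it starts; the SHIFT and TIME_CREATED column names come from the question, while the shift time windows are assumptions for illustration.

import pandas as pd
import numpy as np

df = pd.DataFrame({'TIME_CREATED': pd.to_datetime(
    ['2022-01-01 06:30', '2022-01-01 15:10', '2022-01-01 23:45'])})

hour = df['TIME_CREATED'].dt.hour
conditions = [
    (hour >= 6) & (hour < 14),   # assumed morning shift window
    (hour >= 14) & (hour < 22),  # assumed afternoon shift window
]
# Rows matching the first condition get SHIFT 1, the second get 2, the rest 3.
df['SHIFT'] = np.select(conditions, [1, 2], default=3)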

1 More Replies
SQLArchitect
by New Contributor
  • 1009 Views
  • 1 reply
  • 1 kudos

Writing Records Failing Constraint Requirements to Separate Table when using Delta Live Tables

Are there any plans / capabilities in place or approaches people are using for writing (logging) records failing constraint requirements to separate tables when using Delta Live Tables? Also, are there any plans / capabilities in place or approaches ...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 1 kudos

According to the language reference documentation, I do not believe quarantining records is possible right now out of the box, but there are a few workarounds under the current functionality: create a second table with the inverse of the expectations...
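A sketch of that inverse-expectations workaround, assuming a Python Delta Live Tables pipeline; the rule, table, and source names are placeholders.

import dlt
from pyspark.sql import functions as F

RULES = {"valid_id": "id IS NOT NULL"}  # hypothetical constraint set
QUARANTINE = " OR ".join(f"NOT ({r})" for r in RULES.values())

@dlt.table
@dlt.expect_all_or_drop(RULES)
def clean_orders():
    return spark.read.table("raw_orders")  # placeholder source

@dlt.table
def quarantined_orders():
    # Inverse of the expectations: keep exactly the rows the clean table drops.
    return spark.read.table("raw_orders").where(F.expr(QUARANTINE))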

Kamal2
by New Contributor II
  • 9670 Views
  • 3 replies
  • 2 kudos

Resolved! PDF Parsing in Notebook

I have PDF files stored in Azure ADLS. I want to parse the PDF files into PySpark dataframes. How can I do that?

Latest Reply
User16752240003
Contributor
  • 2 kudos

If you have familiarity with Scala you can use Tika; Tika is a wrapper around PDFBox. In case you want to use it in Databricks, I suggest you go through this blog and Git repo. For Python-based code you may want to use PyPDF2 as a pandas UDF in S...
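A rough sketch of the PyPDF2-as-pandas-UDF idea, assuming PyPDF2 3.x and Spark's built-in binaryFile source; the ADLS path is a placeholder.

import io
import pandas as pd
from PyPDF2 import PdfReader
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

@pandas_udf(StringType())
def extract_text(content: pd.Series) -> pd.Series:
    def read_pdf(raw):
        reader = PdfReader(io.BytesIO(raw))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return content.apply(read_pdf)

# binaryFile exposes path and content (raw bytes) columns.
pdfs = spark.read.format("binaryFile").load("abfss://container@account.dfs.core.windows.net/pdfs/")
texts = pdfs.select("path", extract_text("content").alias("text"))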

2 More Replies
MudassarA
by New Contributor II
  • 10874 Views
  • 4 replies
  • 1 kudos

Resolved! How to fix TypeError: __init__() got an unexpected keyword argument 'max_iter'?

# Create the model using sklearn (don't worry about the parameters for now):
model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000)
Train/fit the model to the train-part of the dataset:
model.fit(X_train, y_train)
ERROR: Typ...

Latest Reply
Fantomas_nl
New Contributor II
  • 1 kudos

Replacing max_iter with n_iter resolves the error. Thanks! It is a bit unusual to get errors like this in this type of solution from Microsoft - as if it could not be prevented.
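For context: max_iter was introduced in scikit-learn 0.19 as the replacement for the older n_iter, so which keyword works depends on the installed version. A version-tolerant sketch:

from sklearn.linear_model import SGDRegressor

try:
    # scikit-learn >= 0.19 accepts max_iter
    model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000)
except TypeError:
    # older releases only know the n_iter keyword
    model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, n_iter=3000)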

3 More Replies
Artem_Yevtushen
by New Contributor III
  • 1155 Views
  • 1 reply
  • 2 kudos

Show all distinct values per column in dataframe

Problem statement: I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write. Use this code to show th...
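The shared snippet is cut off, so here is a minimal sketch of the dynamic collect_set() approach the post describes; the table name is a placeholder.

from pyspark.sql import functions as F

df = spark.table("my_table")  # placeholder
# Build one collect_set aggregation per column, dynamically.
distinct_per_column = df.agg(*[F.collect_set(c).alias(c) for c in df.columns])
display(distinct_per_column)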

collect set table
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Artem Yevtushenko - This is great! Thank you for sharing!

aimas
by New Contributor III
  • 3786 Views
  • 8 replies
  • 5 kudos

Resolved! error creating tables using UI

Hi, I try to create a table using the UI, but I keep getting the error "error creating table <table name> create a cluster first" even when I have a cluster already running. What is the problem?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Be sure that a cluster is selected (the arrow next to the database) and that at least the Default database exists.

7 More Replies
Orianh
by Valued Contributor II
  • 13941 Views
  • 11 replies
  • 10 kudos

Resolved! Read JSON files from the s3 bucket

Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no result", or, if I don't specify the schema, "unable to infer a schema". I tried to mount the S3 bucket; it still does not work. Here is some code th...

Latest Reply
Prabakar
Esteemed Contributor III
  • 10 kudos

Please refer to the doc that helps you to read JSON. If you are getting this error, the problem should be with the JSON schema; please validate it. As a test, create a simple JSON file (you can get one on the internet), upload it to your S3 bucket, and ...
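A quick sanity check along those lines; the bucket and paths are placeholders, and the multiLine option matters whenever each file holds one pretty-printed JSON document rather than JSON Lines.

# JSON Lines files (one JSON object per line):
df = spark.read.json("s3://my-bucket/test/simple.json")

# Pretty-printed, multi-line JSON documents:
df = spark.read.option("multiLine", True).json("s3://my-bucket/test/")
df.printSchema()
df.show()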

10 More Replies
Data_Bricks1
by New Contributor III
  • 1911 Views
  • 7 replies
  • 0 kudos

Data from 10 blob containers with multiple hierarchical folders (per-day and per-hour folders) in each container, into a Delta Lake table in Parquet format - incremental loading of the latest data, inserts only, no updates

I am able to load data for a single container by hard-coding it, but I am not able to load from multiple containers. I used a for loop, but the data frame keeps only the last container's last folder record. One more issue is that I have to flatten the data, when I ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

For sure the function (def) should be declared outside the loop; move it to just after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break or sleep, etc.).
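A sketch of the accumulation pattern the reply implies: define the helper once, then union the per-container frames instead of overwriting one frame on every pass. The container names, storage URL, and flatten helper are placeholders.

from functools import reduce
from pyspark.sql import DataFrame

def flatten_df(df):
    # placeholder for the real flattening logic
    return df

containers = ["container1", "container2"]  # hypothetical names
frames = []
for c in containers:
    raw = spark.read.json(f"wasbs://{c}@account.blob.core.windows.net/2023/01/01/")
    frames.append(flatten_df(raw))

# Union everything; keeping only the last iteration's frame was the original bug.
result = reduce(DataFrame.unionByName, frames)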

6 More Replies
StephanieRivera
by Valued Contributor II
  • 977 Views
  • 1 reply
  • 5 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, as these are transactional tables (there are history commits and snapshots), I would not store images or videos there: the same data can be saved several times, you will have high storage costs, and it can also be slow when the data is big. I would definitely store images,...

User16826994223
by Honored Contributor III
  • 3658 Views
  • 2 replies
  • 1 kudos

AssertionError: assertion failed: Unable to delete the record but I am able to select it though

Is there any reason this command works well:
%sql SELECT * FROM datanase.table WHERE salary > 1000
returning 2 rows, while the below:
%sql DELETE FROM datanase.table WHERE salary > 1000
errors with:
Error in SQL statement: AssertionError: assertion failed:...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

DELETE FROM (and similarly UPDATE) isn't supported on Parquet files - right now on Databricks it's supported for the Delta format. You can convert your Parquet files into Delta using CONVERT TO DELTA, and then this command will work for you.
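How that might look from a notebook, assuming the table from the question is a Parquet table registered in the metastore (the datanase.table name is kept from the question):

# Convert the Parquet table in place; after this the delete succeeds.
spark.sql("CONVERT TO DELTA datanase.table")
spark.sql("DELETE FROM datanase.table WHERE salary > 1000")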

1 More Replies
dataslicer
by Contributor
  • 5596 Views
  • 4 replies
  • 4 kudos

Resolved! Unable to save Spark Dataframe to driver node's local file system as CSV file

Running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark dataframes that I am able to source via credential passthrough, reading from ADLS Gen2 via the `abfss://` method, and display the full content ...

Latest Reply
Dan_Z
Honored Contributor
  • 4 kudos

Modern Spark operates by a design choice to separate storage and compute. So saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...
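For small frames there is a common workaround, sketched here on the assumption that the data fits on the driver: collect to pandas, or write through a file:/ path (which lands on the driver's disk only when, as in this question, driver and worker share one machine).

# Small dataframe: collect to the driver and write with pandas.
df.toPandas().to_csv("/tmp/output.csv", index=False)

# Or have Spark write a single part file to the local filesystem;
# on a single-node cluster the driver and worker share that disk.
df.coalesce(1).write.mode("overwrite").option("header", True).csv("file:/tmp/output_dir")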

3 More Replies
PaulHernandez
by New Contributor II
  • 14865 Views
  • 7 replies
  • 0 kudos

Resolved! How to show an image in a notebook using html?

Hi everyone, I am just learning how to personalize Databricks notebooks and would like to show a logo in a cell. I installed the Databricks CLI and was able to upload the image file to DBFS. I try to display it like this: displayHTML("<im...

Latest Reply
_robschaper
New Contributor II
  • 0 kudos

@Paul Hernandez @Sean Owen @Navneet Tuteja I solved this after I also ran into the same issue, where my notebook suddenly wouldn't show an image sitting on the driver in an accessible folder - no matter what I was trying in the notebook, the display...
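Since that reply is cut off, one pattern known to work for DBFS-hosted images, relying on the standard FileStore mapping (a file at dbfs:/FileStore/... is served at the /files/... URL):

# logo.png uploaded to dbfs:/FileStore/images/ is reachable at /files/images/:
displayHTML("<img src='/files/images/logo.png' width='200'/>")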

6 More Replies
daindana
by New Contributor III
  • 6338 Views
  • 4 replies
  • 4 kudos

Resolved! Why doesn't my notebook display widgets when I use 'dbutils' while it is displayed with '%sql CREATE WIDGET'?

The widget is not shown when I use dbutils, while it works perfectly with SQL. For example:
%sql
CREATE WIDGET TEXT state DEFAULT "CA"
This one shows me the widget.
dbutils.widgets.text("name", "Brickster", "Name")
dbutils.widgets.multiselect("colors", "oran...

dbutils get info from widget dbutils widget creation
Latest Reply
daindana
New Contributor III
  • 4 kudos

Hello, Ryan! For some reason, this problem is solved, and now it is working perfectly! I did nothing new, but it is just working now. Thank you!:)

3 More Replies