Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, We are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...
Hi @Nick_Hughes This may be late for your scenario - but hopefully others facing similar issues will find it useful. You can specify how data is generated in `dbldatagen` using rules in the data generation spec. If rules are specified for data generat...
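For instance, a rough sketch (all names hypothetical) of keeping a foreign-key relationship consistent across two generated tables with `dbldatagen`, by drawing the fact table's customer_id from the same 1..200 range used as the dimension's primary key:

import dbldatagen as dg
from pyspark.sql.types import IntegerType, DoubleType

# Dimension: 200 rows, customer_id generated sequentially over 1..200
dim_spec = (dg.DataGenerator(spark, name="dim_customers", rows=200)
            .withColumn("customer_id", IntegerType(), minValue=1, maxValue=200)
            .withColumn("segment", "string", values=["retail", "wholesale"]))

# Fact: customer_id drawn randomly from the same key range, so every
# foreign key resolves to an existing dimension row
fact_spec = (dg.DataGenerator(spark, name="fact_sales", rows=10000)
             .withColumn("customer_id", IntegerType(), minValue=1, maxValue=200, random=True)
             .withColumn("amount", DoubleType(), minValue=1.0, maxValue=500.0, random=True))

dim_df = dim_spec.build()
fact_df = fact_spec.build()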
Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while running notebooks in Databricks, but I cannot figure out how to do ...
We can use APIs and pyodbc to achieve this. Do go through the official Databricks documentation; it might be helpful for accessing data from outside the Databricks environment.
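For a locally run notebook, one common route besides pyodbc is the databricks-sql-connector package (pip install databricks-sql-connector). A minimal sketch; the hostname, HTTP path, and token below are placeholders for your workspace's values:

from databricks import sql

# Connect to a SQL warehouse (or cluster) from a local machine
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 5")
        for row in cursor.fetchall():
            print(row)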
To create external tables we need to use the LOCATION keyword with a link to the storage location. In reference to that, does the user need to have permission on the storage location? If not, will we use storage credentials to provide the ac...
Hi Nimai, That's partially right. You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage cre...
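As an illustration, a minimal sketch of that recommended pattern, with hypothetical names and URL (run on a Unity Catalog-enabled cluster or in the SQL editor):

# Wrap the storage credential in an external location, then grant on the location
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
    URL 'abfss://landing@mystorageacct.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL my_storage_cred)
""")
spark.sql("GRANT CREATE EXTERNAL TABLE, READ FILES ON EXTERNAL LOCATION landing_zone TO `data_engineers`")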
I have a DB called Test in the Hive metastore of Databricks. This DB contains around 100 tables. Each table has a column called sourcesystem along with many other columns. Now I need to display the count of records in each table grouped by source sy...
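A sketch of one way to approach this, assuming every table in Test really has a sourcesystem column (tables without it would need to be skipped):

# Loop over all tables in the database and count records per sourcesystem
for t in spark.catalog.listTables("Test"):
    counts = spark.sql(
        f"SELECT '{t.name}' AS table_name, sourcesystem, COUNT(*) AS record_count "
        f"FROM Test.{t.name} GROUP BY sourcesystem"
    )
    counts.show()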
Hi @Krish K Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
Hi, I am able to perform a merge from 2 tables but have a requirement to update a table based on 3 tables, like the following query:

update a set a.name = b.name
from table1 a
inner join table2 b on a.id = b.id
inner join table3 c on a.id = c.id

Thanks in advance.
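One hedged sketch of how this can be expressed in Databricks: Delta Lake SQL has no UPDATE ... FROM with joins, but MERGE accepts a multi-table source subquery (table and column names follow the question):

# Join table2 and table3 in the MERGE source, then update table1 on match
spark.sql("""
    MERGE INTO table1 a
    USING (
        SELECT b.id, b.name
        FROM table2 b
        INNER JOIN table3 c ON b.id = c.id
    ) src
    ON a.id = src.id
    WHEN MATCHED THEN UPDATE SET a.name = src.name
""")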
Hi @upendra kumar sharma Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Thanks and Regards
In our Databricks workspace, we have several Delta tables available in the hive_metastore catalog. We are able to access and query the data via Data Science & Engineering persona clusters with no issues. The cluster has credential passthrough en...
Hi @Rafael Gomez Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
Hi All, I exported all tables from a Postgres snapshot into S3 in parquet format. I am trying to read the tables using Databricks and I am unable to do so. I get the following error: "Unable to infer schema for Parquet. It must be specified manually....
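A sketch of specifying the schema manually, with hypothetical column names; note this error also appears when the path contains no parquet files at all, so listing the S3 prefix first is worth checking:

from pyspark.sql.types import StructType, StructField, LongType, StringType

# Explicit schema sidesteps inference when it fails
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])
df = spark.read.schema(schema).parquet("s3://my-bucket/postgres-export/my_table/")
df.show(5)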
Hi @shiva charan velichala Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that bes...
I have a requirement where I need to apply an inverse DQ rule on a table to track the invalid data, for which I can use the following approach:

import dlt
rules = {}
quarantine_rules = {}
rules["valid_website"] = "(Website IS NOT NULL)"
rules["valid_locatio...
You can get additional info from the DLT event log, which is stored in Delta, so you can load it as a table: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality
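For example, a minimal sketch of pulling the expectation metrics out of the event log, assuming a hypothetical pipeline storage location:

# flow_progress events carry per-expectation pass/fail counts in the details JSON
events = spark.read.format("delta").load("dbfs:/pipelines/<pipeline-id>/system/events")
expectations = (events
    .where("event_type = 'flow_progress'")
    .selectExpr("timestamp",
                "details:flow_progress.data_quality.expectations AS expectations")
    .where("expectations IS NOT NULL"))
expectations.show(truncate=False)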
How can I list all the tables which have CDF enabled? I can check whether CDF is enabled on a single table with the code below:

SHOW TBLPROPERTIES tableA(delta.enableChangeDataFeed)

The return:
key ...
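A rough sketch of one way to list them all, assuming a hypothetical database name my_db; non-Delta tables may need a try/except around the SHOW TBLPROPERTIES call:

# Collect the tables whose delta.enableChangeDataFeed property is true
cdf_tables = []
for t in spark.catalog.listTables("my_db"):
    props = spark.sql(f"SHOW TBLPROPERTIES my_db.{t.name}").collect()
    if any(r.key == "delta.enableChangeDataFeed" and r.value == "true" for r in props):
        cdf_tables.append(t.name)
print(cdf_tables)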
Hi @Tim zhang, Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.
Hi @Mahesh Babu Uppala You can use the following method to delete only the duplicate tables:
%scala
spark.sql("""SHOW TABLES""").createOrReplaceTempView("tables")
val temp_tables = spark.sql("""select tableName from tables where tableName...
We moved to Databricks a few months ago; before that we were on SQL Server. So, all our tables and databases follow the "camel case" rule. Apparently, in Databricks the rule is "lower case with underscore". Where can we find an official doc...
Hi @Salah KHALFALLAH, looking at the documentation it appears that Databricks' preferred naming convention is lowercase with underscores, as you mentioned. The reason for this is most likely because Databricks uses the Hive Metastore, which is case-insens...
Hello Experts, Is there any API in Databricks that allows writing data into Databricks tables? I would like to send small log records to Databricks tables from another service. What are my options? Thank you very much.
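One option is the SQL Statement Execution API (POST /api/2.0/sql/statements); the Python SQL connector is the other usual route for small writes like this. A hedged sketch, where the host, token, warehouse id, and target table are all placeholders:

import requests

# Submit a parameterized INSERT against a SQL warehouse over REST
resp = requests.post(
    "https://<workspace-host>/api/2.0/sql/statements",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "INSERT INTO logging.service_events VALUES (:event_ts, :message)",
        "parameters": [
            {"name": "event_ts", "value": "2024-01-01T00:00:00Z", "type": "TIMESTAMP"},
            {"name": "message", "value": "service started"},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["status"])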
Data Engineering - CTAS - External Tables. Can someone help me understand why, in chapter 3.3, we cannot directly use CTAS with OPTIONS and LOCATION to specify the delimiter and location of a CSV? Or did I misunderstand? Details: In Data Engineering with Databri...
The 2nd CTAS statement will not be able to parse the CSV in any manner because it's just the FROM statement that points to a file. It's more of a traditional SQL statement with SELECT and FROM. It will create a Delta table. This just happens to b...
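To make that concrete, a sketch of the two-step pattern the course material points to (names and paths hypothetical): first declare a table over the CSV with explicit options, then CTAS from it so the result is a properly parsed Delta table.

# Step 1: external table over the raw CSV, with delimiter and location
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_csv (order_id INT, amount DOUBLE)
    USING CSV
    OPTIONS (header = 'true', delimiter = '|')
    LOCATION '/mnt/raw/sales/'
""")
# Step 2: CTAS reads through the CSV table and writes a Delta table
spark.sql("CREATE OR REPLACE TABLE sales AS SELECT * FROM sales_csv")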
@Hubert Dudek We currently have a process in place that reads in a SQLite file. We recently transitioned to using Databricks. We were hoping to be able to create a SQLite file so we didn't have to alter the current process we have in place.
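If the table is small, one hedged workaround is to build the SQLite file on the driver with pandas and the standard-library sqlite3 module, then copy it off the driver (names hypothetical):

import sqlite3

# toPandas() pulls everything to the driver, so this suits small tables only
pdf = spark.table("my_db.my_table").toPandas()
con = sqlite3.connect("/tmp/export.db")
pdf.to_sql("my_table", con, if_exists="replace", index=False)
con.close()
# Copy the finished SQLite file from the driver's local disk to DBFS
dbutils.fs.cp("file:/tmp/export.db", "dbfs:/FileStore/export.db")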
Hello, I've read the posts: Jobs - Delta Live tables difference (databricks.com) and Difference between Delta Live Tables and Multitask Jobs (databricks.com). My understanding is that Delta Live Tables are more like a DSL that simplifies the workflow defini...
@Landan George "Jobs won't be able to do what DLT does" - I read some blogs and watched some videos too, but I still cannot figure out the difference between Jobs and DLT. Does it mean that without Databricks DLT, Databricks Jobs cannot handle Delta table...
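To make the "DSL" point concrete, a minimal sketch of a DLT table declaration (names hypothetical): the pipeline, rather than a job you wire together yourself, works out dependencies, retries, and data-quality handling from these decorators.

import dlt
from pyspark.sql.functions import col

# Declares a streaming table plus a quality gate; DLT infers the rest
@dlt.table(comment="Silver orders with a basic quality gate")
@dlt.expect_or_drop("valid_order", "order_id IS NOT NULL")
def silver_orders():
    return dlt.read_stream("bronze_orders").where(col("amount") > 0)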