Predicting compute required to run Spark jobs
I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?
- 16 Views
- 0 replies
- 0 kudos
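One way to get started (a minimal sketch, not a production approach): collect historical runs and fit a simple model of cores against input size. Everything below is illustrative — the history data is made up, and a real model would use more features (shuffle bytes, stage counts, runtime SLA):

```python
import math

# Hypothetical sketch: predict cores for a Spark job from input size,
# using an ordinary-least-squares line fit over made-up historical runs.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Made-up history: (input GB, cores that gave an acceptable runtime)
history = [(10, 4), (50, 8), (100, 16), (200, 32), (400, 64)]
gb = [h[0] for h in history]
cores = [h[1] for h in history]

a, b = fit_line(gb, cores)

def predict_cores(input_gb):
    # Round up and clamp to at least 1 core.
    return max(1, math.ceil(a * input_gb + b))
```

Calling `predict_cores(150)` then gives a rough core estimate; in practice you would backtest the predictions against actual job runtimes before trusting them for scheduling.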
Hi, While ingesting files from a source folder continuously, I would like to be able to detect the case where files are being deleted. As far as I can tell the Autoloader can not handle the detection of files deleted in the source folder. Hence the c...
Hello, so today I watched the tutorial videos and passed the knowledge test as requested to earn the "Databricks Accredited Lakehouse Fundamentals" Badge. Instead I received the "Certificate of Completion of Fundamentals of the Databricks Lakehouse P...
Hi All, We have a table with an id column generated by uuid(). For ETL we use Databricks/Spark SQL temporary views. We observed strange behavior between a Databricks SQL temp view (create or replace temporary view) and a Spark SQL temp view (df.creat...
In Databricks, where is the Hive metastore located: the control plane or the data plane? For prod systems, what precautions should be taken to secure the Hive metastore?
@as999​ The default metastore is managed by Databricks. If you are concerned about security and would like to have your own metastore, you can go for the external metastore setup. You have the detailed steps in the below doc for setting up the external...
Hello, when developing locally using Databricks Connect, how do I re-establish the SparkSession after the cluster restarts? getOrCreate() seems to return the old, invalid SparkSession even after the cluster restart instead of creating a new one, or am I missing...
If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel.
Hello, When using /api/2.0/preview/sql/queries to list out all available queries, I noticed that certain queries were being shown while others were not. I did a small test on my home workspace, and it was able to recognize certain queries when I defin...
Hi, How many queries were returned by the API call in question? The List Queries documentation describes this endpoint as supporting pagination with a default page size of 25; is that how many you saw returned? Query parameters: page_size integer <= 10...
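If the missing queries are just on later pages, draining the endpoint page by page should surface them. A generic paginator sketch: the `page`/`page_size` parameter names mirror the documented query parameters, but `fetch_page` here is a stub you would replace with a real authenticated HTTP call:

```python
# Hypothetical sketch: drain a paginated list endpoint such as
# /api/2.0/preview/sql/queries, which returns results in pages
# (default page_size 25). fetch_page(page, page_size) stands in
# for the real HTTP call and must return a list of items.

def list_all(fetch_page, page_size=25):
    """Call fetch_page(page, page_size) for page = 1, 2, ...
    until a short (or empty) page signals the end."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return items
```

With `requests`, `fetch_page` would GET the endpoint with `params={"page": page, "page_size": page_size}` and a bearer token, returning the `results` field of the JSON response.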
I'm trying to build an ETL pipeline in which I'm reading JSONL files from Azure Blob Storage, then trying to transform and load them into Delta tables in Databricks. I have created the below schema for loading my data : schema = StructType([ S...
Try this: add .option("multiline", "true") to your read.
I'm trying to connect to an Oracle server hosted in Azure from an AWS Databricks notebook, but the connection keeps timing out. I tested the connection IP using the telnet <hostIP> 1521 command from another EC2 instance, and that seems to reach the Oracle ...
Hi, I have implemented Unity Catalog with a multi-node cluster in Databricks. The workspace instance profile with EC2 access has also been created in IAM, but we are still having a challenge sending emails from Databricks using the SES service. The same is working ...
Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work: SET startDate='2020-01-01'; CREATE OR REFRESH LIVE TABLE filtered AS SELECT * FROM my_table WHERE created_at > ${startDate}; It is g...
Hi @MarkD, You may use set variable_name.var = '1900-01-01' to set the value of the variable, and to read it back, use ${automated_date.var}. Example: set automated_date.var = '1800-01-01' select * from my_table where date = CAST(${autom...
Hi, I am using the CLI to transfer local files to a Databricks Volume. At the end of my upload, I want to create a meta table (storing file name, location, and some other information) and have it as a table on the Databricks Volume. I am not sure how to create ...
Hi @pshuk, Greetings! We understand that you are looking for a CLI command to create a table, but at this moment Databricks doesn't support a CLI command to create tables. You can use the SQL Execution API instead - https://docs.databricks.com/api/workspace/...
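Since the CLI can't create tables, one option after the upload finishes is a single call to the SQL Statement Execution API (POST /api/2.0/sql/statements). A hedged sketch: the host, token, warehouse ID, and table/column names below are placeholders, and only the payload construction is shown; sending it requires a real workspace:

```python
# Hypothetical sketch: build the request body for the SQL Statement
# Execution API to create a metadata table after a CLI upload.
# DATABRICKS_HOST, TOKEN, and the warehouse_id are placeholders.

def create_table_payload(warehouse_id, catalog, schema, table):
    statement = (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} ("
        "file_name STRING, file_path STRING, uploaded_at TIMESTAMP)"
    )
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",
    }

# Sending it (requires the `requests` package and a real workspace):
# import requests
# resp = requests.post(
#     f"{DATABRICKS_HOST}/api/2.0/sql/statements",
#     headers={"Authorization": f"Bearer {TOKEN}"},
#     json=create_table_payload("abc123", "main", "etl", "upload_meta"),
# )
```

A second statement through the same endpoint could then INSERT one row per uploaded file.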
Hi All, I am trying to create an external table from an Azure Blob Storage container. I receive no errors, but there is no data in the table. The Blob Storage contains 4 CSV files with the same columns and about 10k rows of data. Am I missing someth...
Hi, The code looks completely fine. Please check whether the files use a delimiter other than ','. If your CSV files use a different delimiter, you can specify it in the table definition using the OPTIONS clause. Just to confirm, I created a sample table a...
Hi data experts. I currently have an OLTP (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL (Azure Data Factory) process that feeds the data warehouse (Azure Synapse Analytics). My requ...
Hi @Kaniz, I have looked at this topic extensively and have even tried to implement it. I am a champion of Databricks at my organization, but I do not think it currently enables the OLTP scenarios. The closest I have gotten to it is by using the St...
I am not able to run our unit test suite due to a possible bug in the databricks-connect library. The problem is with the DataFrame transformation withColumnRenamed. When I run it on a Databricks cluster (Databricks Runtime 14.3 LTS), the column is ren...
@dbal - can you please try withColumnsRenamed() instead? Reference: https://docs.databricks.com/en/release-notes/dbconnect/index.html#databricks-connect-1430-python