Data Engineering

Forum Posts

by shraddharane • New Contributor

08-17-2023 4:56:21 AM

1377 Views
2 replies
0 kudos

Migrating legacy SSAS cube to databricks

We have a SQL database designed as a star schema, and we are migrating its data to Databricks. There are cubes built with SSAS that end users query from Excel for analysis. We are now looking for a solution for: 1) Can...

Latest Reply

Kaniz
Community Manager

08-17-2023 5:34:12 AM

0 kudos

Hi @shraddharane, 1) Can cubes be migrated? No, SSAS cubes cannot be migrated directly to Databricks. Databricks does not support multidimensional cubes the way SSAS does. Databricks is a Lakehouse architecture built on the foundation of Delta ...

1 More Replies

by 140015 • New Contributor III

10-19-2022 6:56:57 AM

940 Views
3 replies
1 kudos

Resolved! Using DLT pipeline with non-incremental data

Hi, I would like to know what you think about using Delta Live Tables when the source for the pipeline is not incremental. What I mean is: suppose the data provider creates a new folder with files each time it has an update to the...

Latest Reply

Joe_Suarez
New Contributor III

08-17-2023 4:20:21 AM

1 kudos

When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...

2 More Replies

by GNarain • New Contributor II

11-15-2022 7:51:28 AM

3065 Views
12 replies
5 kudos

Resolved! Is there an API call to set the "Table access control" workspace config?

Is there an API call to set the "Table access control" workspace config?

Latest Reply

Kaniz
Community Manager

08-02-2023 1:41:07 AM

5 kudos

Hi @GNarain, here is an example of the API call. Could you try it and let us know? POST /api/2.0/workspace/update { "workspaceAccessControlEnabled": true } This API call will enable table access control for your workspace. You can make this API call u...

11 More Replies

by Eldar_Dragomir • New Contributor II

08-16-2023 3:59:38 PM

793 Views
1 replies
2 kudos

Resolved! Reprocessing the data with Auto Loader

Could you please give me an idea of how to start reprocessing my data? Imagine I have a folder "/test" in ADLS Gen2 with binaryFiles. They have already been processed by the current pipeline. I want to reprocess the data and continue receiving new data. What t...

Latest Reply

Tharun-Kumar
Honored Contributor II

08-16-2023 9:13:30 PM

2 kudos

@Eldar_Dragomir To reprocess the data, we have to change the checkpoint directory. This restarts processing of the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-...
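The reply above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual pipeline: the function names, the paths passed in, the Delta sink, and the default of 100 files per trigger are all assumptions made for illustration. Only `cloudFiles.format`, `cloudFiles.maxFilesPerTrigger`, and `checkpointLocation` are existing option names; `spark` is expected to be the session a Databricks notebook already provides.

```python
def autoloader_options(file_format="binaryFile", max_files_per_trigger=100):
    """Build Auto Loader reader options (the default values are illustrative)."""
    return {
        "cloudFiles.format": file_format,
        # Caps how many files each micro-batch picks up.
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def start_reprocessing(spark, source_path, new_checkpoint_path, target_path):
    """Start a stream that re-reads source_path from the beginning.

    Pointing the query at a checkpoint directory that has not been used
    before makes Auto Loader treat every existing file as unprocessed;
    new files keep arriving through the same stream afterwards.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options())
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")  # the Delta sink is an assumption
        .option("checkpointLocation", new_checkpoint_path)
        .start(target_path)
    )
```

Because the old checkpoint is left untouched, the previous pipeline state can still be inspected or resumed if the reprocessing run needs to be abandoned.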

by anarad429 • New Contributor

08-15-2023 10:23:50 AM

694 Views
1 replies
1 kudos

Resolved! Unity Catalog + Reading variable from external notebook

I am trying to run a notebook that reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. These sequences of notebooks run perfectly fine on a...

Latest Reply

Atanu
Esteemed Contributor

08-16-2023 8:38:23 PM

1 kudos

I think the issue here is the variable is not created until a value is assigned to it. So, you may need to assign a value to get_sql_schema

by NathanLaw • New Contributor III

08-11-2023 10:03:37 AM

412 Views
1 replies
0 kudos

CPU and GPU Elapse Runtimes

I have 2 questions about elapsed job runtimes. The same scoring notebook is run 3 times as 3 jobs. The jobs are identical, with the same Petastorm code, CPU cluster config (not a Spot cluster), and data, but have varying elapsed runtimes. Elapsed runtimes...

Latest Reply

shyam_9
Valued Contributor

08-16-2023 5:33:28 PM

0 kudos

Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does it stop after running some epochs if there is no improvement in the model performance?

by Sanjay_AMP • New Contributor II

08-11-2023 6:52:43 AM

397 Views
1 replies
1 kudos

Deployment-ready sample source-code for Delta Live Table & Autoloader

Hi all, we are planning to develop an Autoloader based DLT Pipeline that needs to be:
- Deployable via a CI/CD Pipeline
- Observable

Can somebody please point me to source-code that we can start with a firm foundation instead of falling into a newbie-pattern ...

Latest Reply

Priyanka_Biswas
Valued Contributor

08-16-2023 3:38:47 PM

1 kudos

Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage.
• Python code example:
  - Define a table called "customers" that reads data from a CSV file in cloud object storage.
  - Define a...

by wojciech_jakubo • New Contributor III

06-21-2023 6:25:01 AM

4705 Views
7 replies
2 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based third-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

Latest Reply

Tharun-Kumar
Honored Contributor II

08-16-2023 12:59:57 PM

2 kudos

Hi @wojciech_jakubo,
1. JVM memory will not be utilized for Python-related activities.
2. In the image we could only see the storage memory. We also have execution memory, which would also be the same. Hence I came up with the executor memory to be of ...
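Since the reply notes that Python work does not use JVM memory, one way to see what the pandas code itself allocates is to measure it from inside the Python process. The helper below is a rough sketch using only the standard library's `tracemalloc`; the function name is hypothetical and not part of any Databricks or Spark API. It reports allocations made through Python's allocator only, so the figure should be treated as a lower bound on the driver's real footprint.

```python
import tracemalloc


def measure_peak_python_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as traced by tracemalloc.

    Only memory allocated through Python's allocator is counted; JVM
    memory and untracked native buffers are not included.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

Wrapping the third-party call, for example `measure_peak_python_memory(third_party_fn, pdf)`, where both names are placeholders, gives a per-call peak that can be logged alongside the Spark UI figures.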

6 More Replies

by Thor • New Contributor III

07-31-2023 1:48:47 AM

2503 Views
1 replies
2 kudos

Resolved! Dynamically change spark.task.cpus

Hello, I'm facing a problem with big tarballs to decompress. To fit them in memory, I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs: spark.task.cpus 4. This setting is the thresh...
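The effect of that setting can be worked out with simple arithmetic: Spark runs at most floor(executor cores / spark.task.cpus) tasks at once on each executor. The sketch below assumes each executor gets all 8 cores of its VM, which the post does not state explicitly, and the function name is made up for illustration.

```python
def concurrent_tasks_per_executor(executor_cores, task_cpus):
    """Maximum number of tasks an executor runs at the same time."""
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor_cores and task_cpus must be positive")
    return executor_cores // task_cpus


# With the values from the post: 8 cores and spark.task.cpus = 4.
tasks = concurrent_tasks_per_executor(8, 4)
```

Under that assumption `tasks` is 2, so each executor decompresses at most two tarballs at a time, compared with 8 under the default `spark.task.cpus` of 1.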

Latest Reply

jose_gonzalez
Moderator

08-16-2023 10:58:57 AM

2 kudos

Hi @Thor, Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark applicati...

by bharanireddy • New Contributor

08-15-2023 6:51:49 AM

1049 Views
1 replies
0 kudos

Resolved! Unable to access Data Engineer with Databricks V3 course

Hello, since yesterday noon EST, the Data Engineering with Databricks V3 course has been in maintenance mode. Can someone please help restore access? Thank you, Bharani

Latest Reply

youssefmrini
Honored Contributor III

08-16-2023 9:02:19 AM

0 kudos

I believe you will have access now.

by ThomasVanBilsen • New Contributor III

08-15-2023 7:52:45 AM

1212 Views
1 replies
0 kudos

Resolved! Lineage graph now working.

Hey everyone, I've run the following code successfully: CREATE CATALOG IF NOT EXISTS lineage_data; CREATE SCHEMA IF NOT EXISTS lineage_data.lineagedemo; CREATE TABLE IF NOT EXISTS lineage_data.lineagedemo.menu ( recipe_id INT, app string, main ...

Latest Reply

youssefmrini
Honored Contributor III

08-16-2023 9:01:39 AM

0 kudos

I recommend you open a ticket with support.

by lazcanja • New Contributor

08-16-2023 3:00:42 AM

812 Views
1 replies
1 kudos

Resolved! How to update table location with wasb to abfss

I created a table with a location such as: wasb://<container>@<storageaccount>.blob.core.windows.net/foldername. We have updated access to storage accounts to use abfss. I am trying to execute the following command: alter table mydatabase.mytable set...
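For the path itself, the two schemes differ in the prefix and in the endpoint host: `wasb://` or `wasbs://` on `blob.core.windows.net` becomes `abfss://` on `dfs.core.windows.net`. The helper below is a small sketch of that rewrite; the function name and the sample container and account names are hypothetical, and it only converts the URI string, it does not touch the table or any credentials.

```python
import re

# Matches wasb:// or wasbs:// URIs that point at the blob endpoint.
_WASB_URI = re.compile(
    r"^wasbs?://(?P<container>[^@/]+)@(?P<account>[^./]+)"
    r"\.blob\.core\.windows\.net(?P<path>/.*)?$"
)


def wasb_to_abfss(uri):
    """Rewrite a wasb(s) URI to the equivalent abfss URI."""
    match = _WASB_URI.match(uri)
    if match is None:
        raise ValueError(f"not a wasb/wasbs blob URI: {uri!r}")
    path = match["path"] or ""
    return f"abfss://{match['container']}@{match['account']}.dfs.core.windows.net{path}"


new_location = wasb_to_abfss("wasb://data@myaccount.blob.core.windows.net/foldername")
```

The resulting string is what would go into the table's new location; whether the metastore accepts the change still depends on the cluster having working abfss credentials for that storage account.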

Latest Reply

Kaniz
Community Manager

08-16-2023 4:05:55 AM

1 kudos

Hi @lazcanja, The error message indicates an issue with the configuration value for the storage account key. The error might be due to an incorrect or invalid key. Given the information provided, you have correctly changed the configuration from spar...

by SamCallister • New Contributor II

11-22-2019 1:06:32 PM

13079 Views
8 replies
3 kudos

Dynamic Partition Overwrite for Delta Tables

Spark supports dynamic partition overwrite for Parquet tables by setting the config: spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic") before writing to a partitioned table. With Delta tables it appears you need to manually specif...
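The question is cut off, so the following is only a guess at the manual approach it refers to: Delta's `replaceWhere` write option, which overwrites just the rows matching a predicate. The sketch derives that predicate from the partition values present in the incoming DataFrame. Both function names are hypothetical, and the column is assumed to hold string-like values such as dates.

```python
def build_replace_where(column, values):
    """Build a predicate such as: date IN ('2023-08-01', '2023-08-02')."""
    if not values:
        raise ValueError("at least one partition value is required")
    # De-duplicate and sort so the predicate is stable between runs.
    unique = sorted({str(value) for value in values})
    quoted = ", ".join("'" + value.replace("'", "''") + "'" for value in unique)
    return f"{column} IN ({quoted})"


def overwrite_partitions(df, table_path, partition_column):
    """Overwrite only the partitions that appear in df (a Spark DataFrame)."""
    rows = df.select(partition_column).distinct().collect()
    predicate = build_replace_where(partition_column, [row[0] for row in rows])
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .save(table_path)
    )
```

Collecting the distinct values is cheap when the DataFrame touches a handful of partitions but adds an extra pass over the data, so it may be worth passing the values in directly when the caller already knows them.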

Latest Reply

alijen
New Contributor II

08-16-2023 2:15:27 AM

3 kudos

@SamCallister wrote: Spark supports dynamic partition overwrite for Parquet tables by setting the config: spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic") before writing to a partitioned table. With Delta tables it appears you need ...

7 More Replies

by AleksandraFrolo • New Contributor III

08-16-2023 12:54:38 AM

398 Views
0 replies
0 kudos

Web scraping with Databricks

Hello, what is the easiest way to do web scraping in Databricks? Let's imagine that from this link: http://automated.pythonanywhere.com, I need to grab the element "/html/body/div[1]/div/h1[1]" and return its text. How can I do it? Can somebody write ...
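One possible starting point, using only the Python standard library so that nothing needs to be installed on the cluster, is sketched below. It does not evaluate the XPath from the question; it approximates it by returning the text of the first `<h1>` in the page, which is an assumption about the page layout. The class and function names are made up for illustration, and if the element is filled in by JavaScript after the page loads, a plain HTTP fetch like this will not see it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class FirstH1Parser(HTMLParser):
    """Collects the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True  # ignore any later <h1> elements

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    @property
    def text(self):
        return "".join(self._parts).strip()


def first_h1_text(html):
    """Return the text of the first <h1> in an HTML string ('' if none)."""
    parser = FirstH1Parser()
    parser.feed(html)
    return parser.text


def scrape_first_h1(url, timeout=10):
    """Download url and return the text of its first <h1>."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return first_h1_text(response.read().decode(charset, errors="replace"))
```

In a notebook this would be called as `scrape_first_h1("http://automated.pythonanywhere.com")`. The call runs on the driver and requires the cluster to have outbound internet access.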

by DatabricksPract • New Contributor II

08-14-2023 3:11:43 AM

3801 Views
2 replies
2 kudos

Resolved! Get metadata of tables in hive metastore

Hi team, I have a requirement to get the metadata of the tables available in the Databricks Hive metastore. Is there any way to get the metadata of all the tables instead of looping through the tables using DESCRIBE table_name? As the Hive metastore does not support inf...

Latest Reply

DatabricksPract
New Contributor II

08-15-2023 7:08:22 PM

2 kudos

@Tharun-Kumar - Thanks for your quick reply, it worked.

1 More Replies
